How AI Can Help in the Diagnostic Dilemma of Pulmonary Nodules

Simple Summary
Pulmonary nodules are considered a sign of bronchogenic carcinoma; detecting them early can limit disease progression and save lives. Lung cancer is the second most common type of cancer in both men and women. This manuscript discusses the current applications of artificial intelligence (AI) in lung segmentation as well as pulmonary nodule segmentation and classification using computed tomography (CT) scans, as published in the last two decades, in addition to the limitations and future prospects of AI in this field.

Abstract
Pulmonary nodules are the precursors of bronchogenic carcinoma; their early detection facilitates early treatment, which saves lives. Unfortunately, pulmonary nodule detection and classification are subject to inter-reader variation, with a high rate of missed small cancerous lesions, which opens the way for the implementation of artificial intelligence (AI) and computer-aided diagnosis (CAD) systems. The field of deep learning and neural networks is expanding every day, with new models designed to overcome diagnostic problems and to be more applicable and easier to use. In this review, we briefly discuss the current applications of AI in lung segmentation and in pulmonary nodule detection and classification.


Introduction
Lung cancer screening is an important issue, as the disease is the second most common type of cancer in both males and females. Lung cancer is responsible for 25% of all cancer cases in the USA [1]. Early detection is associated with a higher 5-year survival rate. Risk factors for developing lung cancer include all types of smoking (including electronic cigarettes and passive smoking) [2][3][4], a family history in one or more relatives, especially those who developed cancer at a young age [5], chronic obstructive lung disease [6], and human papilloma virus [7]. Recently, the United States Preventive Services Task Force recommended annual screening for lung cancer with low-dose computed tomography (LDCT) for asymptomatic individuals aged 55 to 80 years who have a 30 pack-year smoking history and currently smoke or have quit smoking within the past 15 years. Patients who stopped smoking more than 15 years ago, have a co-existing health problem limiting life expectancy, or are not candidates for surgical resection are excluded from annual screening. The screening algorithm considers the number, density, and size of the solid, part-solid, or non-solid components of the nodules, and a follow-up schedule is designed according to these parameters [8,9]. Artificial intelligence was developed to enhance the computational abilities of computers and to teach them to think, solve problems, and perform tasks in the same way as human beings. Medical image analysis and disease prediction and detection are now among the most exciting applications of artificial intelligence. Using artificial intelligence techniques, computer-aided diagnosis (CAD) systems have been developed for the analysis of medical images and have proved to be very helpful tools. AI techniques could be used to create a learning model suitable for use in clinical practice for lung cancer screening.
The learning model should consist of four main steps: lung segmentation, followed by nodule segmentation/detection, then feature analysis, and finally the exclusion of false positive nodules (see Figure 1). Classification of detected pulmonary nodules into benign and malignant is based upon a predefined set of characteristic features including shape analysis, estimation of growth rate, and appearance analysis [10][11][12]. In this review, we briefly discuss the current applications of AI in lung segmentation and in pulmonary nodule detection and classification. The review covers CT-based studies published in the last two decades.

Lung Segmentation
The first step in almost every CAD system dealing with lung disease is segmentation. In this step, the structure of interest is delineated from its surroundings prior to analysis. Lung segmentation is very challenging because of the different structures with near-similar densities, such as the bronchi, bronchioles, and pulmonary artery and vein branches. Lung segmentation techniques can be grouped into four main categories based on (1) Hounsfield unit (HU) thresholds, (2) deformable boundaries, (3) shape models, and (4) region/edge-based models, in addition to machine learning (ML) based methods and hybrid techniques, which combine several methods to overcome the drawbacks of using a single method (Figure 2). Details of the different categories are given below.
Hounsfield unit (HU) thresholding: Normal lung parenchyma has low HU values and appears hypodense in thoracic CT images, in contrast to other structures such as the heart, blood vessels, or bronchial walls. Researchers have tried to determine an HU threshold that defines lung parenchyma using different methods. Hu et al. [13] proposed a 3-step technique to perform lung segmentation. Their method started by extracting the lung parenchyma using a suitable grey-scale threshold. Then, the right and left lungs were separated using dynamic programming. Lastly, a series of morphological operations was used to refine the pulmonary margins. This method was further used in the works of Ukil and Reinhardt [14], as well as Van Rikxoort [15]. Amato et al. [16,17] used grey-scale thresholding twice: once to extract the thorax from the surrounding structures, and again to extract the lung from the rest of the thoracic structures. A rolling ball algorithm was then applied to the lung periphery so as not to miss any juxta-pleural nodule and to exclude partial volume pixels. Pu et al. [18] designed an adaptive border marching (ABM) algorithm to achieve the same purpose by refining the lung margins. Gao et al. [19] proposed a 4-step method to separate the pulmonary vessels and airways from the lung parenchyma, as well as to separate the right and left lungs, based on a grey-scale threshold. Other researchers used more sophisticated methods to define the threshold used for lung extraction, such as histogram analysis [20] and 3D fuzzy adaptive thresholding [21]. The limitations of threshold-based lung segmentation are mainly related to its reliance on image resolution and on the type of scanner used (e.g., GE, Philips). Another important issue is that the densities of different lung structures may overlap, making differentiation based on HU difficult.
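To make the thresholding idea concrete, the following is a minimal sketch in Python rather than a reproduction of any of the cited methods. It assumes a CT slice is available as a 2D list of HU values, uses an illustrative window of -1000 to -400 HU (an assumption, not a value taken from the cited works), and removes low-density regions connected to the image border so that air outside the body is not labelled as lung. The function names and the toy slice are hypothetical.

```python
from collections import deque


def threshold_lung_mask(hu_slice, lower=-1000, upper=-400):
    """Mark pixels whose HU value lies in [lower, upper] (assumed lung window)."""
    return [[1 if lower <= v <= upper else 0 for v in row] for row in hu_slice]


def remove_border_air(mask):
    """Clear low-density regions that touch the image border (air around the body)."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    queue = deque()
    # Seed the flood fill with every foreground pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and out[r][c] == 1:
                out[r][c] = 0
                queue.append((r, c))
    # Breadth-first flood fill over 4-connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 1:
                out[nr][nc] = 0
                queue.append((nr, nc))
    return out


# Toy slice: air (-1000 HU) around a soft-tissue "body" (40 HU) with two lungs (-800 HU).
toy_slice = [
    [-1000, -1000, -1000, -1000, -1000, -1000, -1000],
    [-1000, 40, 40, 40, 40, 40, -1000],
    [-1000, 40, -800, 40, -800, 40, -1000],
    [-1000, 40, -800, 40, -800, 40, -1000],
    [-1000, 40, 40, 40, 40, 40, -1000],
    [-1000, -1000, -1000, -1000, -1000, -1000, -1000],
]
lung_mask = remove_border_air(threshold_lung_mask(toy_slice))
for row in lung_mask:
    print(row)
```

A fixed window like this is exactly what makes the approach sensitive to scanner and reconstruction differences; the cited methods refine the threshold adaptively and add left/right separation and morphological smoothing, which this sketch omits.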
Deformable boundary models: The second method used for lung segmentation is deformable boundary models, including snakes, active contours, and level sets. These models start from an initial point and then follow the shape of the desired structure under the influence of internal and external forces. Itai et al. [22] utilized a 2D parametric deformable model to extract the lung from computed tomography (CT) images using the lung borders as an external guiding force. Silveria et al. [23,24] presented a technique that uses active contours and level sets. They begin with a thresholding technique, then edge detection is initiated using a robust geometric active contour model around the lung. The contour divides into two and continues by multiple strokes, which are categorized as valid or invalid according to confidence degrees. The major limitation of deformable boundary models is their high sensitivity to the selection of the initial point, in addition to the inhomogeneity of lung structure, which may lead to unsuccessful adaptation to the lung boundaries [25].
Shape-based models: In this method, data stored in the CAD system are used to improve the accuracy of lung segmentation. It utilizes either a statistical shape model or a lung appearance model. Unlike the previously discussed methods, this approach is more effective in dealing with lungs with moderate to severe pathology and with variations in lung anatomy, as it benefits from trained models [26]. Sun et al. [27] proposed a 2-step lung segmentation technique that used a robust active shape model (RASM) matching method to segment the outline of the lungs, guided by a rib cage detection method, followed by an optimal surface finding approach created by Li et al. [28] to fit the initial segmentation result to the lung. The right and left lungs were segmented separately. Sofka et al. [29] designed a multistage learning model that used predefined anatomical data to initialize a statistical shape model. Hau et al. [30] developed a graph-based search algorithm with a cost function that takes into consideration the intensity, gradient, boundary smoothness, and rib anatomical information. Other researchers proposed a user interface framework [31] or Bayesian classification refined by the Markov Gibbs Random Field (MGRF) method [32][33][34]. A similar approach was introduced by Chung et al. [35], who developed a Bayesian approach based on the Chan Vese (CV) model [36], in which the data obtained from the previous or upper frame image were used to predict the lung image. False positive juxta-pleural nodule candidates were excluded via concave point detection and the circle/ellipse Hough transform. The final step was modification of the lung contour by adding the final nodule candidates to the area of the CV model. More recently, Sun et al. [37] presented a new active shape model (ASM) algorithm that detects outlier marker points by a distance method, aiming at a better assessment of the lung periphery and juxta-pleural lung nodules. They also used robust principal component analysis (RPCA) from low-rank theory to remove noise from the images used to construct the ASM. Despite the many advantages of shape models over other lung segmentation methods, their main limitation is their dependence on the accuracy of the stored data [25].
Region-based method: The main idea of region-based segmentation is that neighboring pixels in a certain region have similar values [38]. An example of this method is region growing: if a pixel meets a predefined set of similarity criteria, it is included in the region [38][39][40][41][42]. Other examples include watershed segmentation [43], random walks segmentation [44], graph cuts segmentation [45], and fuzzy connectedness [46]. This method of segmentation is suitable for homogeneous structures such as lungs with no or mild pathology, airways, and pathologic lesions with homogeneous density [25].
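The region growing idea can be sketched in a few lines. The example below is a simplified illustration, not the algorithm of any cited work: it assumes a 2D image stored as a list of lists, a user-chosen seed, and a similarity criterion defined as "within a fixed tolerance of the seed intensity". The function name, tolerance, and toy image are all hypothetical.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed`, adding pixels whose value
    differs from the seed value by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy image: low-density (lung-like) pixels on the left, soft tissue elsewhere.
toy_image = [
    [-800, -790, 30],
    [-810, 40, 35],
    [-805, -795, 50],
]
grown = region_grow(toy_image, seed=(0, 0), tolerance=50)
print(sorted(grown))
```

Because membership depends only on similarity to the seed, the region stops at the soft-tissue pixels; the same property explains why the method struggles when pathology makes the lung inhomogeneous.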
Machine learning-based methods: This method uses learning models built on predefined measurable characteristics (called features) to identify normal and abnormal lung regions as well as different anatomical structures, and finally to construct the lung segmentation. Small image patches are labelled as normal, abnormal, or neighboring soft tissue. The most common pathological patches used in clinical practice include consolidation, ground glass opacities, and fibrosis. A supervised training process extracts features from each pixel/voxel and classifies them to predict the lung field boundaries and reach the final segmentation. A proper lung segmentation should identify both normal and pathological lung regions in the same process, which is performed by examining each voxel in the CT image [47][48][49][50][51]. Multiple sophisticated algorithms have been developed for this task; for example, Mansoor et al. [52] designed an ML algorithm that identifies a large spectrum of pulmonary pathologic lesions, combined with region-based segmentation and neighboring anatomy guided correction. This method is computationally expensive, but its remarkably high accuracy, along with the development of parallel computing and powerful workstations, makes it feasible in clinical practice. One limitation of this method is that it uses small image patches, which makes it impossible to capture structural information such as the global shape of the lung. In addition, it is impossible to obtain feature data sets that fit the anatomical and physiologic lung variations of all subjects. Lastly, because of its pixel-by-pixel assessment, this method is the least efficient compared with the other four major classes of lung segmentation [51,[53][54][55][56].
Hybrid approaches of lung segmentation: No single lung segmentation method can cope with all anatomical and pathological variants on its own, which has encouraged the development of combined approaches, as in the works of Mansoor et al. [52] and Hau et al. [30].
In summary, the reviewed lung segmentation systems in these categories are presented in Table 1.

Table 1. Literature review of lung segmentation systems using Hounsfield unit (HU) threshold, deformable boundaries, shape models, region/edge-based models, or machine learning (ML) based methods.

Method # Subjects System Evaluation
The Dice similarity coefficient (DSC) and mean absolute surface distance of the system were 97.5% ± 0.6% and 0.84 ± 0.23, respectively.
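For readers unfamiliar with the Dice similarity coefficient (DSC) used as an evaluation measure in Table 1, it quantifies the overlap between a predicted segmentation A and a reference segmentation B as DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). The snippet below is a minimal sketch on toy binary masks; the function name and the masks are illustrative assumptions and do not correspond to any data from the reviewed studies.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary 2D masks (lists of lists)."""
    pred = {(r, c) for r, row in enumerate(pred_mask) for c, v in enumerate(row) if v}
    true = {(r, c) for r, row in enumerate(true_mask) for c, v in enumerate(row) if v}
    if not pred and not true:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2 * len(pred & true) / (len(pred) + len(true))


# Toy masks: three foreground pixels each, two of which overlap.
predicted = [[1, 1, 0],
             [1, 0, 0]]
reference = [[1, 1, 0],
             [0, 1, 0]]
dsc = dice_coefficient(predicted, reference)
print(round(dsc, 3))
```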

Pulmonary Nodule Detection and Segmentation
Lung cancer screening programs rely mainly on early detection of pulmonary nodules using LDCT [71][72][73][74][75][76][77]. LDCT provides imaging of the thoracic region with high contrast, temporal, and spatial resolution in a very short acquisition time (a single breath hold). However, detection of lung nodules is not as simple as it looks, as pulmonary nodules usually appear as white spherical structures that can mimic a nearby small blood vessel or a collapsed bronchiole. In addition, detection and characterization of pulmonary nodules are subjective and prone to inter-reader variation [10,78,79]. This opens the way for artificial intelligence and deep learning to overcome human errors and provide more effective procedures. The process of lung nodule detection has two stages: first, detection of the pulmonary nodule candidates; second, exclusion of the false positive nodules (FPN), keeping only the true positive nodules (TPN). In other words, detection is followed by classification [10,78,79].
Computer-aided diagnosis (CAD) systems: A large public database, the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), was created to provide data that can be used to assess the performance of CAD detection and diagnostic systems and to support further development. The creation of this database required great effort, as CAD was not used in the annotation of the included images [80]. Other data, such as those derived from the Dutch-Belgian NELSON lung cancer screening trial and the LUNA16, LIDC, DSB2017, NLST, TianChi, and ELCAP datasets, have been used by most current research works dealing with CAD and deep learning (DL) [81]. The first step in the process of nodule detection is to sharpen the CT images by adjusting the image threshold, which improves discrimination of pulmonary nodules from the surrounding lung parenchyma. A series of 3D cylindrical and spherical filters and template matching were used to detect small lung nodules [82][83][84][85][86][87][88][89]. However, the geometry of the candidate nodules does not always fit these spherical, cylindrical, or circular assumptions, as a nodule may be spiculated by nature or attached to a nearby pleural surface or blood vessel [90]. Other studies proposed methods to detect lung nodules using the k-means clustering technique [91][92][93], with further use of rule-based classifiers and linear discriminant analysis (LDA) to eliminate normal lung structures and reduce FPN. One study tried to solve the problem of eliminating an overlapping or contacting blood vessel by choosing a proper region of interest (ROI) in a 3-step model [94]. On the other hand, Oda et al. [95] and Siata et al. [96] used 3D algorithms (a 3D filter based on an orientation map of gradient vectors, and a 3D distance transformation) to overcome the same problem. Brown et al. [97] used prior patient images to create a patient-specific model, so that any change in the size and morphology of pulmonary nodules could be detected easily in follow-up images. Messay et al. [98] used a fully automated CAD system that utilizes intensity thresholding and morphological operations to detect pulmonary nodules with a sensitivity of 82.66% at 3 FPN/scan. A set of 245 features was computed for each segmented lung nodule and a Fisher Linear Discriminant (FLD) classifier was utilized. Similarly, Setio et al. [99] designed a CAD system to detect pulmonary nodules larger than 10 mm. They also used a multi-stage process of thresholding and morphological operations; the extracted nodules were then segmented, a set of 24 features was computed, and finally the nodules were classified by a radial basis function support vector machine (SVM). A recent study aimed to solve the problem of uncertain class data by applying a CAD system based upon semi-supervised extreme learning machines (SS-ELM). This was done by using both labelled feature sets of certain class and unlabelled feature sets for training [100].
Deep learning: Deep learning is an advanced type of machine learning that uses complex algorithms to model high-level features and recognize characteristics. It is composed of statistical models that predict results based on previous training on annotated or unlabelled datasets [101]. The algorithm can predict the presence of a pulmonary nodule or predict its nature, whether benign or malignant [102]. The convolutional neural network (CNN) is one of the most commonly used DL algorithms in clinical practice. It was originally implemented in LeNet, which was designed by Yann LeCun et al. [103]. Since then, it has gained popularity and has outperformed the previous state-of-the-art texture analysis and support vector machine (SVM) methods. A CNN model can learn from scratch, even when dealing with new unlabelled features, without the need for a predefined set of features or complex human-led pipelines, in contrast to tissue radiomics or feature analysis. Another advantage of CNNs over other methods is that all their components are optimized at the same time, whereas in tissue radiomics, for instance, there is no guarantee that all components will reach a high level of performance. Additionally, CNNs require limited human supervision [10,104,105]. In the last decade, several research works have proposed different CNN algorithms and models designed for pulmonary nodule detection.
Two studies showed exceptionally high accuracy (96.6-99%), sensitivity (96.9-97.5%), and specificity (96.3-97.5%). They proposed algorithms that either combined 2D and 3D artificial neural networks with intensity-based statistical features [106] or used a CAD system based on angular histograms of surface normals (AHSN) features of different dimensions [107]. Other researchers used 2D and 3D subsets of features [108], local shape analysis and data-driven local contextual feature learning [109], geometric and intensity statistical features [110], or deep neural networks (DNN) [111]. Bergtholdt et al. [112] found that using a support vector machine classifier improved the accuracy, sensitivity, and specificity of pulmonary nodule detection. One study [113] used a deep belief network (DBN) to detect large nodules (>30 mm) with a high accuracy of about 90%. Jakobs et al. [114] compared the performance of two commercial and one academic state-of-the-art CAD systems and found that the updated commercial CAD system (Herakles) had the highest sensitivity, reaching 82% at 3.1 FPN/scan. They found that about one third of the missed nodules were subsolid and recommended the addition of a CAD scheme designed for subsolid nodules to improve the sensitivity of nodule detection. Another recent study reviewed several research works and found high sensitivity of DL algorithms when using the LUNA16 dataset (in the range of 94.4-97%), with an average of 4 FPN/scan, and the LIDC-IDRI dataset (in the range of 80.06-94.1%) [115].
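The performance figures quoted above (accuracy, sensitivity, specificity, and FPN/scan) all derive from the same candidate-level counts. As a hedged illustration, the sketch below computes them from true positives, false positives, true negatives, and false negatives; the function name and the counts are invented for the example and are not taken from any cited study.

```python
def detection_metrics(tp, fp, tn, fn, n_scans):
    """Candidate-level performance measures for a nodule detection system."""
    return {
        "sensitivity": tp / (tp + fn),                 # detected fraction of true nodules
        "specificity": tn / (tn + fp),                 # rejected fraction of non-nodules
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # overall correct fraction
        "fp_per_scan": fp / n_scans,                   # false positive nodules per scan
    }


# Hypothetical counts: 100 true nodules and 200 non-nodule candidates over 10 scans.
metrics = detection_metrics(tp=82, fp=30, tn=170, fn=18, n_scans=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and FPN/scan are usually reported together because either one can be improved at the expense of the other by moving the decision threshold.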
Pulmonary nodule segmentation: Nodule size, along with its progressive increase on follow-up, is a strong predictor of neoplastic nature [116]. One large study demonstrated that the risk of developing cancer in subjects with nodules smaller than 100 mm³ equals that of subjects with no nodules [117]. Nodule size is better assessed through volumetry than through diameter, as 2D measurements were found to be unreliable and showed wide inter- and intra-observer variation [118]. Automated 3D measurement of pulmonary nodules provides a better assessment of their morphology and growth rate [119]. Accurate nodule volumetry requires good nodule segmentation. Manual segmentation of lung nodules is time consuming and far less accurate than semiautomated deep learning methods [120]. Most of the available algorithms concerned with pulmonary nodule detection rely on the growing edge method, where a predefined threshold acts as a seed that connects all nearby voxels of higher density [121]. As mentioned before, solid pulmonary nodules have a higher density than the surrounding lung parenchyma, which allows easy discrimination by the growing edge method, but difficulties occur when a vessel contacts or passes beside a pulmonary nodule or when the nodule approximates the pleura [121,122]. The detection of ground glass nodules with indistinct margins is very problematic in manual segmentation. Tao et al. and Zhou et al. proposed novel methods for this task: a multi-level statistically based method [123] and a classifier that boosts k-nearest neighbor (kNN), whose distance measure is the Euclidean distance between the nonparametric density estimates of two regions [124]. Another more recent study segmented subsolid nodules through voxel classification that automatically eliminates blood vessels [125]. Other studies described more complex approaches to segment pulmonary nodules of different densities and those with either vascular or pleural attachment via analysis of the core of the nodule [79,126,127].
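To illustrate why volumetry depends on segmentation, the following sketch computes a nodule volume by voxel counting from a binary 3D mask and converts a volume into the diameter of a sphere of equal volume, using V = πd³/6. It also includes the commonly used volume doubling time formula, VDT = t · ln 2 / ln(V2/V1), as one way to express growth rate. The function names, voxel spacing, and toy mask are assumptions made for the example; they are not taken from the cited studies.

```python
import math


def nodule_volume_mm3(mask, spacing_mm):
    """Volume of a binary 3D mask (slices x rows x cols) by voxel counting."""
    voxel_volume = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = sum(v for plane in mask for row in plane for v in row)
    return n_voxels * voxel_volume


def equivalent_diameter_mm(volume_mm3):
    """Diameter of a sphere with the given volume, from V = pi * d**3 / 6."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def volume_doubling_time_days(v1_mm3, v2_mm3, interval_days):
    """Volume doubling time assuming exponential growth between two scans."""
    return interval_days * math.log(2) / math.log(v2_mm3 / v1_mm3)


# Toy mask: 2 x 2 x 2 foreground voxels with 1.0 x 0.5 x 0.5 mm spacing.
toy_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
volume = nodule_volume_mm3(toy_mask, spacing_mm=(1.0, 0.5, 0.5))
print(volume)

# A 100 mm^3 nodule corresponds to a sphere of roughly 5.8 mm in diameter.
print(round(equivalent_diameter_mm(100.0), 2))

# A nodule that grows from 100 to 200 mm^3 in 90 days has doubled exactly once.
print(volume_doubling_time_days(100.0, 200.0, 90))
```

Because volume scales with the cube of the diameter, a small error in a 2D diameter measurement translates into a much larger relative error in estimated volume, which is one reason direct volumetry is preferred.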
Table 2 presents a summary of the state-of-the-art pulmonary nodule detection and segmentation systems.
10 ground-glass opacity nodules. All 10 nodules detected with only 1 false positive nodule.
Dehmeshki et al. [122]: Adaptive sphericity oriented contrast region growing on the fuzzy connectivity map of the object of interest.
Thresholding approach based on internal texture (solid/part-solid and non-solid) and external attachment (juxta-pleural and juxta-vascular).

Nodule Classification
One of the major limitations of using CAD systems for the detection of lung nodules is the high false positive rate, which reduces their accuracy and lowers their efficacy as a screening framework that could be used on a large-scale population. False positive nodules are associated with extra costs and hazards, as they lead to unnecessary biopsies, prolonged follow-up imaging, and extra worry for patients and their families. Accurate classification of detected pulmonary nodules is therefore of utmost importance to overcome these problems. Nodule classification comes after nodule detection and segmentation. TPNs are classified by two broad architectures: either a radiomics feature-based scheme or deep learning models [136][137][138][139] (Figure 3). The radiomics scheme uses different sets of features, which can be morphological/shape features (spherical disproportion, circularity, etc.), texture features, gray scale/histogram features (average, standard deviation, skewness, etc.), gradient features (average, standard deviation, kurtosis, etc.), and spatial features (location of the nodule) [140,141]. The data extracted from image voxels are then gathered and transformed into numeric form, called radiomic features [142]. A group of numeric features (radiomics) forms what is called a feature vector. Then, a classifier (which is a machine learning model) differentiates feature vectors according to training algorithms and labelled data [143]. Widely used classifiers include the support vector machine and random forest [144]. The advantage of the radiomics model is that it can build high-performance models out of limited datasets, yet it requires manual tumor segmentation and hand-crafted feature extraction [145][146][147].
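As an illustration of how voxel intensities become a numeric feature vector, the sketch below computes four of the gray scale/histogram features named above (average, standard deviation, skewness, and kurtosis) from a list of intensities inside a segmented nodule. It is a simplified example under stated assumptions: the function name and sample values are hypothetical, population (not sample) moments are used, and the subsequent classifier (for example a support vector machine or random forest) is not shown.

```python
import math


def histogram_features(values):
    """First-order intensity features: [mean, std, skewness, kurtosis].

    Population moments are used; kurtosis is the non-excess form
    (a normal distribution would give 3).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:
        # A perfectly uniform region has no defined shape statistics; use 0 by convention.
        return [mean, 0.0, 0.0, 0.0]
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return [mean, std, skewness, kurtosis]


# Hypothetical HU values sampled inside a segmented nodule.
nodule_values = [10, 20, 30, 40, 50]
feature_vector = histogram_features(nodule_values)
print([round(f, 3) for f in feature_vector])
```

In a full radiomics pipeline, vectors like this one would be concatenated with shape, texture, gradient, and spatial features before being passed to the classifier.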
On the other hand, deep learning classifiers are built as end-to-end convolutional neural networks, fully connected neural networks, or deep neural networks that reach the final nodule classification through semantic feature analysis [12,[147][148][149][150][151]. As mentioned earlier, ML and neural networks do not require segmentation or hand-crafted feature extraction [152,153]. A DNN can assess difficult cases that do not fit the predefined feature characteristics and still give satisfactory results. Deep architectures such as ResNet and DenseNet are usually used to train the DNN model [69,136,154,155]. The process of nodule classification requires analysis of data obtained from 3D images. However, most of the available models either use 2D data to build a 3D CNN model [156] or use a multi-view 2D CNN model [157][158][159]. Uthoff et al. [156] developed an ML pipeline using k-medoids clustering and information theory to pick efficient predictor sets for different amounts of parenchyma. Their method had a high sensitivity of 100% and a specificity of 96%. On the other hand, Shen et al. [157] used a multiscale 2-layered CNN to diagnose lung cancer in chest CT images, reaching an accuracy of 84.86%, while Jung et al. [160] used a 3D deep convolutional neural network (DCNN) with shortcut and dense connections to classify lung nodules. These connections allow gradients to pass directly and quickly, thus overcoming the vanishing gradient problem, and the network acquires three-dimensional features instead of two-dimensional ones. Their method had a higher competition performance metric (CPM), of about 0.9, than other state-of-the-art methods. Chen et al. [160] used a neural network ensemble (NNE) to evaluate lung nodules and differentiate between probably malignant, uncertain, and probably benign nodules with an accuracy of 78.7%.
Another study using texture features and artificial neural networks found that feed-forward back propagation gave more accurate nodule classification than feed-forward neural networks and that skewness was the most accurate parameter [161]. Kumar et al. [149] proposed another type of neural network for lung nodule classification, called a stacked autoencoder (SAE), with an accuracy of 75.01%. Wilms et al. [78] presented a model-based 4D segmentation of lungs with large tumors in 4D CT data sets, in which a 4D statistical shape model is fitted to the 4D image sequence while respecting inter- and intra-patient variation. Ardila et al. proposed a DL model that extracts data from a patient's prior and current CT images to predict the risk of developing bronchogenic carcinoma [162]. This model had high accuracy when applied to lung cancer screening trial cases and to an independent validation group. They compared their results with those of a group of 6 radiologists. Interestingly, their model was comparable to the radiologists in the evaluation of prior and recent CT images together, but it outperformed the radiologists when evaluating the recent CT image only. Li et al. [163] evaluated the diagnostic performance of a commercial CAD software program called Infer-Read CT Lung Research (ICLR), which is based on a 3D CNN. They found that ICLR had high accuracy in risk prediction for bronchogenic carcinoma, unlike for benign or metastatic lesions. One recent study [164] used a 2-level classification of pulmonary nodules into benign and malignant, with further subdivision of malignant nodules into serious and mild malignant nodules, using a CNN with transfer learning; it attained high accuracy similar to that of other published research.
Other studies were more concerned with correlating the morphological features of pulmonary nodules with the fingerprint of genetic mutations of the pathological types of lung cancer (radio-genomics). This is particularly important in assessing the success of gene-inhibiting therapy [164][165][166][167][168].

Limitations and Future Prospects
The scale of the dataset used for a CNN model is a crucial factor in determining whether the model can be trained well [182]. Collecting a large number of annotated images can be a year-long process or even impossible owing to the nature of medical imaging. To overcome this problem, large public datasets were introduced. Another solution is to artificially generate data similar to those used in the training of the CNN; one example is the generative adversarial network (GAN) [133]. Another suggested solution is to implement transfer learning. Transfer models and LeNet5 were suggested to deal with conditions where large datasets are not available. Transfer learning simply uses preexisting data from a source task to analyze data obtained from a target task, which is useful in situations where the target task has little data [183]. A recent study used a CNN and LeNet5 to classify pulmonary nodules into benign or malignant, with further sub-classification of various types of malignancies [184]. A limitation that comes with data sharing and data transfer is the legal aspect of patient privacy. Another limitation is the lack of uniform terminology among radiologists (for example, when to describe a nodule as subsolid or non-solid) or among pathologists (minimally invasive carcinoma or carcinoma in situ), which in turn leads to non-uniform labelling of data that may affect the trained model. The solution to this problem would be the creation of a data-reporting system that unifies medical terms, as was done with BI-RADS and LI-RADS. In clinical practice, radiologists usually benefit from clinical data to direct the differential diagnosis and reach a proper decision. However, most of the available algorithms depend only on features derived from the images, with little or no consideration of clinical data such as age and the presence or absence of risk factors (e.g., smoking). Algorithms that combine clinical and imaging data are the solution to this limitation [185].
Finally, many algorithms and models are proposed but they lack generalizability and are used mainly in research works.

Conclusions
AI and its multiple arms, including CAD, ML, and DL, are used to design complex algorithms that detect and further characterize pulmonary nodules in order to predict malignancy risk. Over the last decade, a large number of radiomic features and artificial neural networks have been proposed, each with its own advantages and drawbacks. To date, no specific method has gained enough acceptance to be applied to the general population.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
Hidden conditional random field
SCPM-Net: Sphere center-points matching detection network
SD-U-Net: Squeeze and attention, and dense atrous spatial pyramid pooling U-Net